Expertise modelling

ABSTRACT

A method of ranking experts in a subject matter field in an expertise model by selecting documents from the set of documents that refer to the subject to create a subject related subset of documents, selecting extracts from the subset of documents that refer to the subject and then analysing the linguistic structure of the extracts.

This invention relates to methods of expertise modelling and more particularly to methods of ranking experts in a subject matter field.

In large and/or multi-site organisations it is difficult to utilise the expertise of individuals to the best advantage of the organisation. Thus, for example, one part of an organisation may “reinvent the wheel” because it is not aware of work carried out some years previously, or indeed concurrently, by another part of the organisation. Another common example of where organisations do not make best use of individuals' knowledge is where an individual within the organisation needs help in a particular area in which they are not “expert”, or in other words are a novice. Often the best solution is to find someone else within the organisation with the relevant expertise, namely an expert who can answer the novice's questions. However, novices often have difficulty characterising their own questions and expertise, and this hinders their search for an expert to assist them.

To help organisations make better use of individuals' knowledge, Expert Finder systems have been developed. An Expert Finder is a system designed to locate people who have “sought-after knowledge” to solve a specific problem. It provides the names of potential helpers in response to knowledge-seeking queries, in order to establish personal contacts which link novices to experts. The ultimate goal of such a system is to create environments where users are aware of each other, maximising their current resources and actively exchanging up-to-date information. Although Expert Finder systems cannot always generate correct answers, bringing the relevant people together provides opportunities for them to become aware of each other, and to have further discussions, which may uncover hidden expertise.

Not only do Expert Finders help to manage effectively the useful knowledge held by individuals and thus supplement additional resources, but they also contribute timely and up-to-date procedural and factual knowledge to enterprises. In order to fully maximise individually held resources, it is necessary to encourage people to share such valuable data. To enable such data to be utilised to its maximum potential it is important that the collection and management of the data does not interfere with an individual's everyday tasks or place onerous obligations on individuals. Thus collection and management must be “invisible” to the individual until their assistance is required. As expertise is accumulated through task achievement, it is also important to exploit it as it is created. To achieve this, an automated system that does not rely on the individual is required. Such an approach allows individuals to work as normal without demanding changes in working environments.

Expert Finders exploit already existing data banks such as e-mail communications to capture personal expertise while allowing users to work as they normally would without changing the working environment. E-mail communications are an ideal data bank for Expert Finders to exploit because e-mail communication has become a major means of exchanging information and acquiring social or organisational relationships; thus it can be a good source of information about recent and useful co-operative activities among users. In addition, as it represents an everyday activity, it requires no major changes to the working environment.

Other data banks, such as an electronic library of reports, minutes of meetings or transcripts of telephone conversations, may be used.

User profiles are created to decide whether an individual is an expert for a given problem. The standard method of creating user profiles is based on a statistical approach. The frequency of keywords in documents and the number of documents a user has created containing the keywords are used to rank users for different subjects, creating user profiles. User profiles may also contain rankings for other factors, such as “helpfulness”, that is, how willing users are to assist other users when contacted, measured by counting the number of responses to queries and the speed of responses.
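The statistical approach described above can be sketched as follows. This is only an illustration: the `(author, text)` data layout, the function name `frequency_profile` and the scoring rule (keyword occurrences plus number of matching documents) are assumptions, not details taken from any particular Expert Finder.

```python
from collections import defaultdict

def frequency_profile(documents, keyword):
    """Rank authors for one keyword using simple counts.

    documents: list of (author, text) pairs (assumed layout).
    The score is keyword occurrences plus matching documents,
    an assumed combination chosen only for illustration.
    """
    term_count = defaultdict(int)
    doc_count = defaultdict(int)
    kw = keyword.lower()
    for author, text in documents:
        hits = text.lower().split().count(kw)
        if hits:
            term_count[author] += hits
            doc_count[author] += 1
    scores = {a: term_count[a] + doc_count[a] for a in term_count}
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda a: (-scores[a], a))

docs = [
    ("ann", "phase analysis of the phase data"),
    ("bob", "what is phase"),
    ("ann", "no match here"),
]
ranking = frequency_profile(docs, "phase")
```

Note that in this toy data `bob` only asks a question yet still enters the ranking, because the count takes no account of how the keyword is used.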

KnowledgeMail™ from Tacit Knowledge Systems Inc. (www.tacit.com./knowledgemail) adds an automatic profiling ability to some of the existing commercial e-mail systems, to support information sharing through executing queries about the profiles constructed. User profiles are formulated as a list of weight-valued terms by using a statistical method. A survey focusing on the system's performance reveals that users tend to spend extra time cleaning up their profiles in order to reduce false hits, which erroneously recommend them as experts due to unresolved ambiguous terms.

Maybury, M., D'Amore, R., House, D. (2001), Automated Discovery and Mapping of Expertise, developed an Expert Finder system that exploits the intellectual products created within an organisation to support automated expertise identification. The system considers a user to be an expert on a topic if he/she is linked to a wide range of documents and/or a large number of documents about that topic. It combines multiple items of evidence demonstrating associations with the user in determining the level of expertise of the user. This qualifies experts by requiring detailed evidence; however, such evidence is collected from the measurement of information usage patterns, rather than from the analysis of the meanings and functional roles of such information.

However, such a statistical approach has severe drawbacks, including:

-   counting keywords is not adequate for determining whether a given document is factual information or contains some level of author expertise.
-   without understanding the semantic meanings of keywords, it is possible to assume that different words represent the same concept and vice versa, which triggers the retrieval of non-relevant information.
-   it is not easy to distinguish question-type texts from potential answer documents, meaning that asking a question about a subject will improve a user's profile even though the question may indicate that the user has little knowledge of the subject.

It is an object of the present invention to provide a different method of creating user profiles and expert rankings, providing more meaningful user profiles.

A first aspect of the present invention provides a method for ranking creators of a set of documents in order of their expertise in a subject including the steps of:

-   selecting documents from the set of documents that refer to the subject to create a subject related subset of documents;
-   selecting extracts from the subset of documents that refer to the subject;
-   analysing the linguistic structure of the extracts;
-   using the analysis to rank the creators.
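The four steps can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: documents are `(creator, text)` pairs, subject matching is a plain substring test, extracts are obtained by splitting on full stops, and the linguistic analysis is passed in as a hypothetical `analyse` callable that returns a numeric score per extract.

```python
from collections import defaultdict

def rank_creators(documents, subject, analyse):
    """Rank creators for a subject.

    documents: list of (creator, text) pairs (assumed layout).
    analyse:   callable mapping an extract to a numeric score
               (stands in for the linguistic analysis step).
    """
    subject = subject.lower()
    # Step 1: select the subject related subset of documents.
    subset = [(c, t) for c, t in documents if subject in t.lower()]
    scores = defaultdict(float)
    for creator, text in subset:
        # Step 2: select extracts (here, sentences) that refer to the subject.
        extracts = [s.strip() for s in text.split(".") if subject in s.lower()]
        # Step 3: analyse each extract.
        for extract in extracts:
            scores[creator] += analyse(extract)
    # Step 4: use the analysis to rank the creators.
    return sorted(scores, key=lambda c: (-scores[c], c))

def toy_analyse(extract):
    # Placeholder analysis: favour extracts that start with "I".
    return 2.0 if extract.lower().startswith("i ") else 1.0

docs = [
    ("ann", "I know the parser fails. Lunch is at noon."),
    ("bob", "The parser is slow."),
    ("cat", "Nothing relevant here."),
]
ranking = rank_creators(docs, "parser", toy_analyse)
```

Keeping the analysis as a parameter means the same skeleton could host either a frequency count or a verb-based analysis.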

The step of analysing the linguistic structure of the extracts may include:

-   isolating verbs in the extracts to create a set of verbs for classification; and
-   classifying each isolated verb in the set of verbs according to a predetermined hierarchy.

User expertise may be considered to be action-centred and is often distributed in the individual's action-experiences. Using linguistic modelling, action-centred statements in the extracts can be highlighted, and thus a more sophisticated analysis of sentences or extracts containing references to a subject in a document can be made, allowing expert rankings to be derived. With this approach, the extracts may be regarded as the realisation of involved knowledge, user expertise can be verbalised as a direct indication of user views on discussed subjects, and the levels of expertise are distinguished by taking into account the degree of significance of the words employed in the extracts.

The predetermined hierarchy may be created by:

-   mapping isolated verbs to an illocutionary verb in a predefined set of illocutionary verbs; and
-   classifying the mapped isolated verbs according to the Speech Act Theory category of the corresponding illocutionary verb.

Speech Act Theory (SAT) proposes that communication involves the speaker's expression of an attitude (i.e. an illocutionary act) towards the contents of the communication. It suggests that information can be delivered with different communication effects on recipients depending on different speaker's attitudes, which are expressed using an appropriate illocutionary act, which represents a particular function of communication. The performance of the speech act is described by a verb, which posits a core element as the central organiser of a sentence.

More verbs may be classified by:

-   filtering isolated verbs not having a predefined illocutionary verb and thus not successfully mapped to the set of illocutionary verbs;
-   checking for synonyms of the unmapped isolated verbs that have a predefined illocutionary verb; and
-   classifying each isolated verb not having a predefined illocutionary verb in the same category as its synonym.

In order to increase the number of verbs covered by the predetermined hierarchy, a practical solution is to check for synonyms that have illocutionary verbs in the predetermined hierarchy and classify the original verb in the same way as the synonym with an illocutionary verb defined.

Isolated verbs that are not classified may not be used for ranking purposes and thus may be discarded.
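The classification, synonym fallback and discarding of unclassified verbs can be sketched as below. The five category names follow Speech Act Theory; the example verbs in `ILLOCUTIONARY` and the hand-written `SYNONYMS` table are illustrative assumptions standing in for the predefined set of illocutionary verbs and for a real synonym source.

```python
# Assumed sample of predefined illocutionary verbs and their SAT categories.
ILLOCUTIONARY = {
    "assert": "assertive",
    "promise": "commissive",
    "request": "directive",
    "declare": "declarative",
    "thank": "expressive",
}

# Hypothetical synonym table; a real system would consult a lexical database.
SYNONYMS = {
    "state": ["assert"],
    "ask": ["request"],
    "vow": ["promise"],
}

def classify_verbs(verbs):
    """Map each isolated verb to an SAT category, or drop it."""
    classified = {}
    for verb in verbs:
        if verb in ILLOCUTIONARY:
            # Directly mapped to a predefined illocutionary verb.
            classified[verb] = ILLOCUTIONARY[verb]
            continue
        # Unmapped verb: look for a synonym that is predefined.
        for synonym in SYNONYMS.get(verb, []):
            if synonym in ILLOCUTIONARY:
                classified[verb] = ILLOCUTIONARY[synonym]
                break
        # Verbs with no mapped synonym are simply not added (discarded).
    return classified

result = classify_verbs(["assert", "state", "ask", "plot"])
```

Here `plot` has neither a direct mapping nor a mapped synonym, so it is left out of the result.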

Syntactical analysis can be used to isolate verbs by identifying the syntactic roles of words in a sentence using a corpus annotation Apple Pie Parser, which is a bottom-up probabilistic chart parser that finds the parse tree with the best score by the best-first search algorithm. The sentence is decomposed into a group of grammatically related phrases, such as “noun”, “adverb”, “adjective”, “verb”, or “preposition”.

Weighting extracts to favour those written in the first person over those written in the third person may also be used to further refine the ranking process.

The fact that working practices are reflected through task achievement implies that personal expertise can be regarded as action-oriented, emphasising the important role of a “first person” subject in expertise modelling.

Of course the extracts selected may be single sentences.

According to a second aspect of the present invention there is provided a computer programme executable to rank creators of a set of documents in order of their expertise in a subject utilising the method as previously described.

According to a third aspect of the present invention there is provided a computer programmed to rank creators of a set of documents in order of their expertise in a subject according to the method as previously described.

According to a fourth aspect of the present invention there is provided a computer to rank creators of a set of documents in order of their expertise including means for:

-   selecting documents from the set of documents that refer to the subject to create a subject related subset of documents;
-   selecting extracts from the subset of documents that refer to the subject;
-   analysing the linguistic structure of the extracts; and
-   using the analysis to rank the creators.

According to a fifth aspect of the present invention there is provided a system operable to rank creators of a set of documents in order of their expertise in a subject comprising the method as previously described.

By way of example only, an embodiment of the invention will now be described with reference to the accompanying figures in which:

FIG. 1 is a flow diagram outlining the procedure for using Natural Language Processing-based user profiling;

FIG. 2 is a graph summarising the results of a case study carried out to test that Expertise Modelling using Natural Language Processing produces comparable or higher accuracy in differentiating expertise from factual information compared to that of the frequency-based statistical model, and that differentiating expertise from factual information supports more effective query processing in locating the right experts; and

FIG. 3 is a graphical representation of the precision-recall of the same case study as represented in FIG. 2.

An expertise model, EMNLP (Expertise Modelling using Natural Language Processing), captures the different levels of expertise reflected in exchanged e-mail messages, and makes use of such expertise in facilitating a correct ranking of experts. A design objective of EMNLP is to improve the efficiency of the task search, which ranks people's names in decreasing order of expertise against a help-seeking query. Its contribution is to turn once simply archived e-mail messages into knowledge repositories by approaching them from a linguistic perspective, which regards the exchanged messages as the realization of verbal communication among users. Its supporting assumption is that user expertise is best extracted by focusing on the sentences where users' viewpoints are explicitly expressed. NLP is identified as an enabling technology that analyses e-mail messages with two aims: 1) to classify sentences into syntactical structures (syntactic analysis), and 2) to extract users' expertise levels using the functional roles of given sentences (semantic interpretation). FIG. 1 shows the procedure for using EMNLP, i.e. how to create user profiles from the collected messages. Further details of the NLP components are explained within the dotted line. Contents are decomposed into a set of paragraphs, and heuristics (e.g., locating a full stop) are applied in order to break down each paragraph into sentences.
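The decomposition of message contents into paragraphs and then sentences might look like the following sketch. The blank-line paragraph rule and the "full stop followed by whitespace" sentence rule are assumed heuristics chosen for illustration; the function name `split_sentences` is hypothetical.

```python
import re

def split_sentences(content):
    """Break message content into paragraphs, then into sentences.

    Paragraphs are assumed to be separated by blank lines; a sentence
    is assumed to end at a full stop followed by whitespace.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", content) if p.strip()]
    sentences = []
    for paragraph in paragraphs:
        # Split after each full stop, keeping the full stop with its sentence.
        for sentence in re.split(r"(?<=\.)\s+", paragraph):
            sentence = sentence.strip()
            if sentence:
                sentences.append(sentence)
    return sentences

message = "I know the fix. It works.\n\nRob plotted the results."
sentences = split_sentences(message)
```

A heuristic this simple will mis-split abbreviations such as "e.g. this", which is one reason a practical system would need further rules.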

Syntactical analysis identifies the syntactic roles of words in a sentence by using a corpus annotation Apple Pie Parser, which is a bottom-up probabilistic chart parser and finds the parse tree with the best score by the best-first search algorithm. The syntactical analysis supports the location of a main verb in a sentence, by decomposing the sentence into a group of grammatically related phrases, such as “noun”, “adverb”, “adjective”, “verb”, or “preposition”.

Given the structural information about each sentence, semantic analysis examines sentences with two criteria:

-   1) whether the employed verb verbalizes the speaker's attitudes, and
-   2) whether the sentence has a “first person” (e.g., “I”, “In my opinion”, or “We”) subject.

This analysis is based on Speech Act Theory (SAT), which proposes that communication involves the speaker's expression of an attitude (i.e. an illocutionary act) towards the contents of the communication. It suggests that information can be delivered with different communication effects on recipients depending on different speaker's attitudes, which are expressed using an appropriate illocutionary act, which represents a particular function of communication. The performance of the speech act is described by a verb, which posits a core element as the central organiser of the sentence. In addition, the fact that working practices are reflected through task achievement implies that personal expertise can be regarded as action-oriented, emphasizing the important role of a “first person” subject in expertise modelling.

EMNLP extracts user expertise from the sentences which have “first person” subjects, and determines expertise levels based on the identified main verbs. Whereas SAT reasons about how different illocutionary verbs convey the various intentions of speakers, NLP determines the intention by mapping the central verb in the sentence to the pre-defined illocutionary verb. The decision about the level of user expertise is made according to the defined hierarchies of the verbs, initially provided by SAT. SAT provides the categories of illocutionary verbs (i.e. assertive, commissive, directive, declarative, and expressive), each of which contains a set of exemplary verbs. EMNLP further extends the hierarchy in order to increase its coverage for practicability by using the WordNet database. EMNLP first examines all verbs occurring in the collected messages, and then filters out the verbs which have not been mapped onto the hierarchy. For each such verb, it consults the WordNet database in order to assign a value through chaining its synonyms; for example, if a synonym of the given verb is classified as “assertive”, then this verb is also classified as “assertive”.

To clarify how two sentences that may be assumed to contain similar keywords are mapped onto different profiles, consider two example sentences:

-   1) “For the 5049 testing, phase analysis on those high frequency results that Rob plotted is needed”, and
-   2) “For the 5049 testing, I know we need phase analysis on those high frequency results that Rob plotted”.

The main verb values for both sentences (i.e., need and know) are equivalent to “Strong Working Knowledge”, which conveys a relatively high level of knowledge for a speaker. However, the difference is that, compared to the first, the second sentence clearly conveys the speaker's intention as it begins with “I know”. As a consequence, it is regarded as demonstrating expertise while the first sentence is not. Information extracted from the first sentence is mapped onto a lower level of expertise.
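The treatment of the two example sentences can be sketched as follows. Several points are assumptions made only for illustration: the main verb is supplied by the caller (in place of a parser), a first person subject is approximated by the presence of the tokens "I" or "we", the five level names are taken in ascending order, and a sentence without a first person subject is demoted by exactly one level.

```python
import re

# Expertise levels in ascending order (ordering assumed).
LEVELS = [
    "Working Interests",
    "Strong Working Interests",
    "Working Knowledge",
    "Strong Working Knowledge",
    "Expert-Level Knowledge",
]

# Verb values for the two example verbs only.
VERB_LEVEL = {
    "need": "Strong Working Knowledge",
    "know": "Strong Working Knowledge",
}

FIRST_PERSON = {"i", "we"}

def has_first_person(sentence):
    """Crude stand-in for detecting a first person subject."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return any(word in FIRST_PERSON for word in words)

def expertise_level(sentence, main_verb):
    """Return the level for a sentence, demoting non-first-person text."""
    index = LEVELS.index(VERB_LEVEL[main_verb])
    if not has_first_person(sentence):
        index = max(index - 1, 0)  # assumed one-step demotion
    return LEVELS[index]

s1 = ("For the 5049 testing, phase analysis on those high frequency "
      "results that Rob plotted is needed")
s2 = ("For the 5049 testing, I know we need phase analysis on those "
      "high frequency results that Rob plotted")
level1 = expertise_level(s1, "need")
level2 = expertise_level(s2, "know")
```

With these assumptions the first sentence falls to “Working Knowledge” while the second keeps “Strong Working Knowledge”, even though both share nearly all their keywords.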

A case study was developed to test two hypotheses, namely:

-   1) that EMNLP produces comparable or higher accuracy in differentiating expertise from factual information compared to that of the frequency-based statistical model, and
-   2) that differentiating expertise from factual information supports more effective query processing in locating the right experts.

As a baseline, a frequency-based statistical model, which builds user profiles by weighting presented terms without considering their meanings or purposes, was used.

A total of 10 users, who work for the same department in a professional engineering design company, participated in the experiment, and a period of three to four months was spent collecting e-mail messages. A total of 18 queries was created for a testing dataset, and a maximum of 40 names of predicted experts, i.e. 20 names extracted using EMNLP and 20 names from the statistical model, were shown to a user, who was the group leader of the other users. As a manager, the user was able to evaluate the retrieved names according to the five pre-defined expertise levels: “Expert-Level Knowledge”, “Strong Working Knowledge”, “Working Knowledge”, “Strong Working Interests” and “Working Interests”.

FIG. 2 summarizes the results measured by normalised precision. For 4 queries, EMNLP produced lower performance rates than the statistical approach. However, for 14 queries, its ranking results were more accurate, and at the highest point it outperformed the statistical method with a 33% higher precision value. The precision-recall curve, which demonstrates a 23% higher precision value for EMNLP, is shown in FIG. 3. The differences of precision values at different recall thresholds are rather small with EMNLP, implying that its precision values are relatively higher than those of the statistical model.
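For readers unfamiliar with precision-recall curves, the sketch below computes plain precision and recall points for one ranked list of names. It does not reproduce the normalised precision measure used in the case study, whose exact definition is not given here; the function name and the sample names are hypothetical.

```python
def precision_recall_points(ranked, relevant):
    """Return (recall, precision) at each rank where a relevant name occurs.

    ranked:   names in the order returned for a query.
    relevant: names judged to be genuine experts for that query.
    """
    relevant = set(relevant)
    points = []
    hits = 0
    for k, name in enumerate(ranked, start=1):
        if name in relevant:
            hits += 1
            recall = hits / len(relevant)
            precision = hits / k
            points.append((recall, precision))
    return points

points = precision_recall_points(["ann", "bob", "cat", "dan"], {"ann", "cat"})
```

In this toy ranking, precision is 1.0 at a recall of 0.5 and drops to two thirds once the second expert is found at rank three.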

A close examination of the queries used for testing reveals that the statistical model has a better capability in processing general-type queries that search for non-specific factual information, since

-   1) as we regard user expertise as action-oriented, knowledge is distinguished from such factual information, implying that it is difficult to value factual information as knowledge with EMNLP, and
-   2) EMNLP is limited in exploring various ways of determining the level of expertise in that it constrains user expertise to be expressed through the first person in a sentence.

EMNLP was developed to improve the accuracy of ranking the order of expert names by use of the NLP technique to capture explicitly stated user expertise, which otherwise may be ignored. Its improved ranking order, compared to that of a statistical method, was mainly due to the use of an enriched expertise acquisition technique, which successfully distinguished experienced users from novices. It is envisaged that EMNLP would be particularly useful when applied to large organisations where it is vital to improve retrieval performance, since typical queries may be answered with a list of a few hundred potential expert names.

Special attention is given to gathering domain specific terminologies, possibly collected from technical documents such as task manuals or memos. This is particularly useful for the semantic analysis, which identifies concepts and relationships within the NLP framework, since these terminologies are not retrievable from general-purpose dictionaries (e.g. the WordNet database).

It will be understood by the skilled reader that e-mail communication is just one of a number of examples of databases of information that could be used with an expert model system as described above. For example, in a Java programming domain, the system could model a user's programming skill by reading source code files and analysing what classes, libraries or methods are used and how often. This result is then compared to the overall usage for the remaining users, to determine the levels of expertise for specific topics (e.g., methods). Its automatic profiling and mapping of five levels of expertise (i.e., expert-advanced-intermediate-beginner-novice) is in accordance with the prior art. However, the system could be refined by assessing various coding patterns that might reveal the different skills of experts and beginners, in a similar way to the analysis of the linguistic structure described above.
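The comparison of one user's usage with that of the remaining users could be sketched as a simple ratio. The `usage` data layout, the function name `relative_usage` and the use of a mean are assumptions for illustration; no thresholds for the five levels are given here, so none are applied.

```python
def relative_usage(usage, user, topic):
    """Ratio of a user's count for a topic to the mean count of other users.

    usage: {user: {topic: count}} gathered from source files (assumed layout).
    """
    own = usage[user].get(topic, 0)
    others = [counts.get(topic, 0) for name, counts in usage.items() if name != user]
    mean_others = sum(others) / len(others) if others else 0.0
    if mean_others == 0:
        # No comparison basis: treat any own usage as unbounded, none as zero.
        return float("inf") if own else 0.0
    return own / mean_others

usage = {
    "ann": {"HashMap.put": 30},
    "bob": {"HashMap.put": 10},
    "cat": {"HashMap.put": 20},
}
ratio = relative_usage(usage, "ann", "HashMap.put")
```

A ratio above 1 indicates heavier than average use of the topic, which on its own says nothing about how well it is used; that gap is what the coding-pattern refinement mentioned above would address.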

1. A method for ranking creators of a set of documents in order of their expertise in a subject including the steps of: selecting documents from the set of documents that refer to the subject to create a subject related subset of documents; selecting extracts from the subset of documents that refer to the subject; analyzing the linguistic structure of the extracts by isolating verbs in the extracts to create a set of verbs for classification; classifying each isolated verb in the set of verbs according to a predetermined hierarchy; and using the analysis to rank the creators.
2. A method for ranking creators of a set of documents according to claim 1 including the further step of: creating the predetermined hierarchy by mapping isolated verbs to an illocutionary verb in a predefined set of illocutionary verbs; and classifying the mapped isolated verbs according to the Speech Act Theory category of the corresponding illocutionary verb.
3. A method for ranking creators of a set of documents according to claim 2 including the further step of: filtering isolated verbs not having a predefined illocutionary verb and thus not successfully mapped to the set of illocutionary verbs; checking for synonyms of the unmapped isolated verbs that have a predefined illocutionary verb; and classifying the unmapped isolated verbs according to the Speech Act Theory category of the corresponding illocutionary verb of its synonym.
4. A method for ranking creators according to claim 1, wherein isolating verbs includes the step of: decomposing sentences in the extracts into a group of grammatically-related phrases, such as “noun”, “adverb”, “adjective”, “verb” or “preposition”.
5. A method for ranking creators of a set of documents according to claim 1, including the step of: weighting extracts to favor those written in the first person over those written in the third person.
6. A method for ranking creators according to claim 1, wherein the set of documents is e-mail communications.
7. A computer program executable to rank creators of a set of documents in order of their expertise in a subject according to the method of claim 1.
8. A computer programmed to rank creators of a set of documents in order of their expertise in a subject according to the method of claim 1.
9. A computer to rank creators of a set of documents in order of their expertise including means for: selecting documents from the set of documents that refer to the subject to create a subject related subset of documents; selecting extracts from the subset of documents that refer to the subject; analyzing the linguistic structure of the extracts by isolating verbs in the extracts to create a set of verbs for classification, and classifying each isolated verb in the set of verbs according to a predetermined hierarchy; and using the analysis to rank the creators.